Apparatus and method for coding an information signal into a data stream, converting the data stream and decoding the data stream

ABSTRACT

More customization and adaptation of coded data streams may be achieved by processing the information signal such that the various syntax structures obtained by pre-coding the information signal are placed into logical data packets, each of which is associated with a specific data packet type of a predetermined set of data packet types, and by defining a predetermined order of data packet types within one access unit of data packets. The consecutive access units in the data stream may, for example, correspond to different time portions of the information signal. By defining the predetermined order among the data packet types it is possible, at the decoder's side, to detect the borders between successive access units even when removable data packets are removed from the data stream on the way from the data stream source to the decoder, without incorporating any hints into the remainder of the data stream.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Divisional of U.S. patent application Ser. No. 12/716,542, entitled APPARATUS AND METHOD FOR CODING AN INFORMATION SIGNAL INTO A DATA STREAM, CONVERTING THE DATA STREAM AND DECODING THE DATA STREAM, filed Mar. 3, 2010, which is a Divisional of U.S. patent application Ser. No. 12/422,169, entitled APPARATUS AND METHOD FOR CODING AN INFORMATION SIGNAL INTO A DATA STREAM, CONVERTING THE DATA STREAM AND DECODING THE DATA STREAM, filed Apr. 10, 2009, now U.S. Pat. No. 8,139,611, which claims priority to U.S. patent application Ser. No. 10/788,776, entitled APPARATUS AND METHOD FOR CODING AN INFORMATION SIGNAL INTO A DATA STREAM, CONVERTING THE DATA STREAM AND DECODING THE DATA STREAM, filed Feb. 27, 2004, now U.S. Pat. No. 7,586,924, all of which are incorporated herein by this reference thereto.

BACKGROUND OF THE INVENTION

1. Technical Field of the Invention

The present invention relates to information signal coding schemes in general and, in particular, to coding schemes suitable for single media or multimedia signal coding, such as video coding or audio coding.

2. Description of the Prior Art

The MPEG-2 video coding standard, which was developed about 10 years ago primarily as an extension of prior MPEG-1 video capability with support of interlaced video coding, was an enabling technology for digital television systems worldwide. It is widely used for the transmission of standard definition (SD) and high definition (HD) TV signals over satellite, cable, and terrestrial emission and the storage of high-quality SD video signals onto DVDs.

However, an increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as Cable Modem, xDSL or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities.

Video coding for telecommunication applications has evolved through the development of the MPEG-2 coding standard, and has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Despite this evolution, there is still a need to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements.

Recently, the MPEG-4 Visual standard has also begun to emerge in use in some application domains of the prior coding standards. It has provided video shape coding capability, and has similarly worked toward broadening the range of environments for digital video use.

However, the video coding schemes available today have in common that it is difficult to adapt an already coded video stream on its way from its creation to its arrival at a receiver in order, for example, to adapt the performance level of the coded video stream to the performance of the receiver or to the performance of the transmission link connecting the coded video stream's source and the receiver.

For example, an MPEG-4 data stream may be provided at a video server in Dolby surround, thus providing a relatively large number of audio channels. However, the receiver may be a device capable of reproducing only mono audio information. In this case, transferring the coded video stream at its full performance level, i.e. incorporating all audio channels, would waste transmission link capacity. Thus, it would be advantageous if a gateway between the coded video stream source and the receiver could convert the coded video stream from its initial performance level to a lower performance level. However, in available video coding schemes, the gateway cannot convert a video data stream from a higher performance level to a lower performance level merely by discarding the portion of the coded video data stream pertaining to the excess channels without manipulating the remainder of the coded video stream, i.e. the portion concerning both the higher performance level and the lower performance level.

Therefore, there is a need for a video coding scheme which allows a higher “network friendliness” to enable simple and effective customization for a broad variety of systems. To be more specific, the video coding scheme should allow a greater customization of carrying the video content in a manner appropriate for each specific network.

Moreover, the customization and adaptation of coded video streams should be possible with reasonable effort.

SUMMARY OF THE INVENTION

It is the object of the present invention to provide an information signal coding scheme which enables more customization and adaptation of the coded data stream with reasonable effort.

In accordance with a first aspect of the present invention, this object is achieved by an apparatus for coding an information signal, the apparatus comprising means for processing the information signal in order to obtain data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type; and means for arranging the data packets into a data stream so that the data stream comprises consecutive access units of consecutive data packets, so that the data packets within each access unit are arranged in accordance with a predetermined order among the data packet types, wherein the means for processing and the means for arranging are adapted so that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order.

In accordance with a second aspect of the present invention, this object is achieved by an apparatus for converting a data stream representing a coded version of an information signal from a first performance level to a second performance level, the data stream comprising consecutive access units of consecutive data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type, and the data packets within each access unit being arranged in accordance with a predetermined order among the data packet types such that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order, the apparatus comprising means for removing at least one data block of the removable data packet type from the bit stream without manipulating the remainder of the data stream.

In accordance with a third aspect of the present invention, this object is achieved by an apparatus for decoding a data stream representing a coded version of an information signal, the data stream comprising consecutive access units of consecutive data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type, and the data packets within each access unit being arranged in accordance with a predetermined order among the data packet types, such that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order, the apparatus comprising means for detecting a border between successive access units by use of the predetermined order; and means for decoding the successive access units.

In accordance with a fourth aspect of the present invention, this object is achieved by a method for coding an information signal, the method comprising processing the information signal in order to obtain data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type; and arranging the data packets into a data stream so that the data stream comprises consecutive access units of consecutive data packets, so that the data packets within each access unit are arranged in accordance with a predetermined order among the data packet types, wherein the steps of processing and arranging are adapted so that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order.

In accordance with a fifth aspect of the present invention, this object is achieved by a method for converting a data stream representing a coded version of an information signal from a first performance level to a second performance level, the data stream comprising consecutive access units of consecutive data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type, and the data packets within each access unit being arranged in accordance with a predetermined order among the data packet types such that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order, the method comprising removing at least one data block of the removable data packet type from the bit stream without manipulating the remainder of the data stream.

In accordance with a sixth aspect of the present invention, this object is achieved by a method for decoding a data stream representing a coded version of an information signal, the data stream comprising consecutive access units of consecutive data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type, and the data packets within each access unit being arranged in accordance with a predetermined order among the data packet types, such that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order, the method comprising detecting a border between successive access units by use of the predetermined order; and decoding the successive access units.

In accordance with a seventh aspect of the present invention, this object is achieved by a data stream representing a coded version of a video or audio signal, the data stream comprising consecutive access units of consecutive data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type, and the data packets within each access unit being arranged in accordance with a predetermined order among the data packet types such that even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are detectable from the data stream by use of the predetermined order.

The present invention is based on the finding that a customization and adaptation of coded data streams may be achieved by processing the information signal such that the various syntax structures obtained by pre-coding the information signal are placed into logical data packets, each of which is associated with a specific data packet type of a predetermined set of data packet types, and by defining a predetermined order of data packet types within one access unit of data packets. The consecutive access units in the data stream may, for example, correspond to different time portions of the information signal. By defining the predetermined order among the data packet types it is possible, at the decoder's side, to detect the borders between successive access units even when removable data packets are removed from the data stream on the way from the data stream source to the decoder, without incorporating any hints into the remainder of the data stream. Due to this, decoders reliably detect the beginnings and endings of access units and are therefore not liable to a buffer overflow despite a removal of data packets from the data stream before arrival at the decoder.

The removable data packets may be data packets which are negligible or not necessary for decoding the values of the samples in the information signal. In this case, the removable data packets may contain redundant information concerning the video content. Alternatively, such removable data packets may contain supplemental enhancement information, such as timing information and other supplemental data that may enhance usability of the decoded information signal obtained from the data stream but are not necessary for decoding the values of the samples of the information signal.

However, the removable data packets may also contain parameter sets, such as important header data, that can apply to a large number of other data packets. In this case, such removable data packets contain information necessary for retrieval of the video content from the data stream. Therefore, if such data packets are removed, they are transferred to the receiver in another way, for example, by use of a different transmission link or by inserting the data packets thus removed somewhere else into the data stream in accordance with the predetermined order among the data packet types, so as not to accidentally create a condition in the data stream that defines the beginning of a new access unit in the middle of an access unit.

Thus, it is an advantage of the present invention that an information signal may be coded into a data stream composed of consecutive data packets, and that removable data packets may be removed from the data stream without having to manipulate the remainder of the data stream while, despite this, the order among the data packet types within access units is maintained so that borders between successive access units are still derivable by use of the order, preferably merely by the knowledge of the order.

Moreover, another advantage of the present invention is the higher flexibility in arranging the data packets in the data stream as long as the arrangement complies with the predetermined order among the data packet types. This allows duplicating data packets for redundancy enhancement purposes as well as adapting the performance level of the data stream to the receiving or transmission environment.

SHORT DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention are described in more detail below with respect to the Figures.

FIG. 1 shows a schematic diagram illustrating a creation, conversion and decoding of a data stream in accordance with an embodiment of the present invention.

FIG. 2 shows a block diagram of a system in which the procedures of FIG. 1 may be realized in accordance with an embodiment of the present invention.

FIG. 3 shows a block diagram of an encoder environment in accordance with an embodiment of the present invention.

FIG. 4 shows a schematic diagram illustrating the structure of a data stream in accordance with a specific embodiment of the present invention.

FIG. 5 shows a syntax diagram for illustrating the structure of an access unit in accordance with the specific embodiment of FIG. 4.

FIG. 6 shows a flow diagram for illustrating a possible mode of operation in the gateway of FIG. 2 in accordance with an embodiment of the present invention.

FIG. 7 shows a schematic diagram illustrating the parameter set transmission via an extra transmission link between encoder and decoder in accordance with an embodiment of the present invention.

FIG. 8 shows a flow diagram illustrating the operation of the decoder of FIG. 2 in accordance with the specific embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE PRESENT INVENTION

Before describing preferred embodiments of the present invention with respect to the figures, it is noted that like elements in the figures are designated by like reference numbers, and that a repeated explanation of these elements is omitted.

FIG. 1 shows the creation, conversion and decoding of a data stream in accordance with an embodiment of the present invention, the data stream representing a coded version of an information signal, such as an audio, video or multimedia signal.

In FIG. 1, the information signal is indicated by reference number 10. Although the information signal 10 could be any time-domain or time-dependent information signal, the information signal 10 is illustrated as a multimedia signal comprised of a video signal or video content 10a and an audio signal or audio content 10b. The video content 10a is illustrated as being composed of a sequence of pictures 12, while the audio signal 10b is illustrated as comprising a sequence of audio samples 14, the sequence extending along the time axis t.

Although the information signal 10 could be handled, such as stored and transferred, in an un-coded digital manner, the information signal 10 is encoded in order to compress it, i.e. to reduce the amount of data necessary to represent the information signal. This encoding process is indicated in FIG. 1 by arrow 16, while an encoder performing the encoding process 16 is indicated at 18 in FIG. 2, which is also referred to in the following and which shows an example of a possible environment in which the processes of FIG. 1 could be employed.

By the encoding process 16 a data stream 20 is obtained. The data stream 20 is composed of a sequence of consecutive data packets 22 and is illustrated as an arrow. The direction of the arrow indicates which of the data packets 22 precedes which data packet 22 of the data stream 20. The data packets are indicated by individual rectangles inside the arrow 20 and are labeled A-F. Each data packet is uniquely associated with one of a predetermined set of data packet types, each data packet type being illustrated by A-F. The data packets 22 are, for example, associated with a respective data packet type by a type number in a header of the data packets 22. Each data packet type would be uniquely associated with a different type number.

Several consecutive data packets 22 are grouped into an access unit, as illustrated by braces 24. In this way, the data stream 20 is composed of immediately consecutive access units 24 which are themselves composed of immediately consecutive data packets 22.

Although access units 24 could have any meaning, in the following it will be assumed that each access unit 24 belongs to a specific time portion of the information signal 10. In the case of a multimedia signal, as illustrated at 10, each access unit 24 could, for example, represent a coded version of a specific one of the pictures 12 and the corresponding portion of the audio signal 14 of the information signal 10.

As will be described in more detail below with respect to FIG. 3, the encoding process 16 could be composed of several steps. For example, as a first step the encoding process 16 could involve a pre-coding step in which samples of the information signal are pre-coded in order to obtain syntax elements of various syntax element types, each syntax element applying either to a portion of one picture 12 or a portion of the audio signal 14, to a whole picture 12 or to a sequence of pictures 12. As a second step, the encoding process 16 could then involve a step of grouping syntax elements of like syntax element type that apply to the same pictures 12 to obtain the individual data packets 22. In a further, last step, these data packets 22 would then be arranged in a sequence in order to obtain the data stream 20, the characteristics of which will be described in more detail below.

In the following, the encoding process 16 is assumed to be optimized in order to achieve a high-performance level coded version of the information signal 10. In other words, the encoding process 16 is assumed to be adjusted in the sense that it creates, among others, syntax elements and corresponding data packets 22 which are not essential or absolutely necessary for retrieval of the information signal from the resulting data stream 20. In particular, it is assumed that the encoder 18 creates a data stream 20 composed of data packets of all possible or envisaged data packet types A-F. Of course, due to the high-performance level of the data stream 20, it involves a greater amount of data than a data stream of a lower performance level.

As shown in FIG. 2, it is assumed that the data stream 20 is first stored in a store 26, such as a video server or the like, to which the encoder 18 is connected. Now, in order to enable the transmission of the data stream 20 to a receiver 28 via a transmission link 30 in an efficient way, a gateway 32 is connected between the store 26 and the receiver 28, and preferably between the store 26 and the transmission link 30. This gateway 32 performs an adaptation or conversion of the data stream 20 from the high-performance level as it is provided in the server 26 to a lower performance level which is adapted to the capacity and performance of the transmission link 30 and receiver 28, respectively. For example, the transmission link 30 may be a transmission link with a very low bit error rate. In this case, the gateway 32 would convert the data stream 20 into a data stream having less or no redundancy information.

In order to enable this conversion, which is illustrated in FIG. 1 by an arrow 34, in an effective and very simple way, the encoding process 16 is performed such that the data packets 22 within one access unit 24 are arranged in accordance with a predetermined order among the data packet types A-F. For illustration purposes only, it is assumed in FIG. 1 that the predetermined order among the data packet types A-F is equivalent to the alphabetical order. Thus, as can be seen from FIG. 1, in each access unit 24 the consecutive data packets 22 are arranged in alphabetical order with respect to their type. It is emphasized that there may be more than one data packet of a specific data packet type in an access unit, although such circumstances are not depicted in FIG. 1, and that the order among such data packets of the same data packet type may or may not be prescribed by a predetermined ordering rule. Moreover, even though it is assumed that the present data stream 20 is of the highest performance level, there may exist access units 24 in the data stream 20 which do not contain data packets of all the data packet types A-F, although such an access unit is not shown in FIG. 1. Moreover, it is noted that for the purpose of enabling adaptation and conversion of the data stream in a simple way, a more relaxed predetermined order among the data packet types A-F could be sufficient, as will be described in the following with respect to FIGS. 4 to 8. To be more precise, it is not necessary that the predetermined order is so strict that each data packet type is fixed to a position in front of all other data packet types, between two other data packet types or after all other data packet types. Rather, it could be sufficient if the predetermined order contains just one or more ordering rules such as “data packets of the removable data packet type X (X=A, . . . , F) have to precede or succeed data packets of data packet type Y (Y≠X and Y=A, . . . , F)”. In particular, it would be possible that instead of the strict alphabetic order, the predetermined order could allow the mixing-up of data packets of the data packet types C and D, for example.
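The relaxed order described above can be pictured as a set of pairwise rules instead of one total ranking. The following Python sketch is illustrative only and is not taken from the embodiments: the rule set RULES and the function name complies_with_rules are hypothetical, and the rules are chosen merely so that types C and D may be mixed while A still has to precede both and F has to follow both.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical rule set: each pair (X, Y) means "packets of type X have to
# precede packets of type Y within one access unit". No rule relates C and D,
# so these two types may appear in either order.
RULES: Set[Tuple[str, str]] = {
    ("A", "C"), ("A", "D"), ("C", "F"), ("D", "F"),
}


def complies_with_rules(unit_types: Sequence[str],
                        rules: Iterable[Tuple[str, str]] = RULES) -> bool:
    """Return True if no packet in the unit follows a packet that, by the
    rules, it would have to precede."""
    rule_set = set(rules)
    for i, earlier in enumerate(unit_types):
        for later in unit_types[i + 1:]:
            # (later, earlier) being a rule means 'later' should have come first.
            if (later, earlier) in rule_set:
                return False
    return True
```

Under these assumed rules, the sequences A, C, D, F and A, D, C, F are both accepted, whereas C, A is rejected because A would have to precede C.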

Due to the prescribed order among the data packet types A-F, the gateway 32 can convert the data stream 20 having a high-performance level to a data stream 36 having a lower performance level merely by removing data packets of some of the removable data packet types which, for example, contain redundant picture information or supplemental enhancement information which is not necessary for retrieval of the pictures 12 or audio signal 14 from the data stream 20. Moreover, the removed data packets of the removable data packet types could also concern essential information. In this case, the gateway 32 would, for example, transmit the information of these data packets via a different transmission link to the receiver 28, as will be described in more detail below.
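A minimal Python sketch of such a conversion is given here for illustration. It assumes that a data packet can be modelled as a type letter plus an opaque payload; the names Packet and drop_removable are hypothetical and do not appear in the embodiments. The point it shows is that the gateway only filters packets by type and leaves every remaining packet untouched and in its original position relative to the others.

```python
from typing import Iterable, List, NamedTuple, Set


class Packet(NamedTuple):
    """Hypothetical model of a data packet 22: a type letter and a payload."""
    ptype: str
    payload: bytes


def drop_removable(stream: Iterable[Packet], types_to_drop: Set[str]) -> List[Packet]:
    """Return the stream without packets of the given types.

    The remaining packets are neither modified nor reordered, so the
    predetermined order among the surviving packet types is preserved.
    """
    return [p for p in stream if p.ptype not in types_to_drop]


# Two access units of the strict alphabetical example, as in FIG. 1.
original = [Packet(t, b"") for t in "ABCDEF" * 2]
converted = drop_removable(original, {"A", "B", "E"})
```

With the example values above, the converted stream carries the types C, D, F, C, D, F.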

As can be seen in FIG. 1, it is assumed that in the conversion process 34 performed by gateway 32 all data packets 22 of the data packet types A, B, and E have been removed from the data stream 20 in order to obtain a shortened data stream 36. As can easily be understood, the borders between successive access units can still easily be detected in the data stream 36 by means of the predetermined order: each time a data packet of a specific data packet type X precedes a data packet of a data packet type Y that, in accordance with the predetermined order, precedes the data packet type X of the preceding data packet, two successive access units 38 abut between these data packets, i.e. a border between two successive access units 38 exists. In the exemplary case of FIG. 1, this condition applies whenever a data packet of the data packet type F precedes a data packet of the data packet type C. Thus, the extent of each access unit 38 in the converted data stream 36 can still easily be obtained at the decoder's side by use of the knowledge of the predetermined order among data packet types even though, at the decoder's side, it is unknown which of the removable data packet types have been removed. Thus, each of the access units in the converted data stream 36, which are indicated by braces 38, corresponds to one of the access units 24 in the data stream 20. In particular, the access units 24 and access units 38 are equal in number and order. Moreover, since the borders between successive access units are detectable even in the modified data stream 36 and are arranged at the same places, removal of data packets merely results in reducing the size of the access units 38 of data stream 36 relative to the access units 24 in data stream 20.
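The border condition just described can be sketched in a few lines of Python. This is an illustration under the simplifying assumption of the strict alphabetical order of FIG. 1; the names ORDER, RANK and split_access_units are hypothetical. A new access unit is started whenever the type of the current packet ranks before the type of the preceding packet. Like the condition in the text, the sketch can only find a border where such a drop in rank actually occurs in the received stream.

```python
from typing import Dict, Iterable, List

# Assumed predetermined order (strict alphabetical, as in the FIG. 1 example).
ORDER = "ABCDEF"
RANK: Dict[str, int] = {t: i for i, t in enumerate(ORDER)}


def split_access_units(types: Iterable[str]) -> List[List[str]]:
    """Group a sequence of packet types into access units.

    A border is detected when a packet's type precedes, in the predetermined
    order, the type of the packet before it. Repeated packets of the same
    type stay in the same access unit.
    """
    units: List[List[str]] = []
    current: List[str] = []
    prev_rank = -1
    for t in types:
        rank = RANK[t]
        if current and rank < prev_rank:
            units.append(current)
            current = []
        current.append(t)
        prev_rank = rank
    if current:
        units.append(current)
    return units
```

Applied to the converted stream C, D, F, C, D, F of the example, the function yields two access units of three packets each, the same number of access units as for the unconverted stream A to F, A to F. The decoder needs no knowledge of which packet types were removed.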

After transmission of the data stream 36 via a transmission link 30 to receiver 28, the converted data stream 36 is decoded at the receiver 28 in a decoding process 40. The receiver 28 may decode the data stream 36 solely by use of the data stream itself if the data packets removed in the converting process 34 merely contained information not necessary for retrieval of the original information signal 10. Otherwise, the receiver 28 decodes the converted data stream 36 based on information contained in the data packets that were removed in the converting process 34 and transmitted to receiver 28 via an extra transmission link, for example.

The result of the decoding process 40 is a decoded information signal 42 in a quality as it would be obtained by directly decoding the data stream 20. Alternatively, the quality of the decoded information signal 42 is somewhat reduced in comparison to the quality of a decoded information signal as obtained directly by decoding data stream 20.

To summarize, by defining the predetermined order among the data packet types, it is possible not only to maintain the correspondence between access units in the original data stream 20 and the access units 38 in the converted data stream 36 but also to enable the receiver 28 to associate each data packet with the access unit it originally belonged to in the original data stream 20. The latter guarantees that a receiver 28 buffering the incoming data packets and emptying the buffer in units of access units is not liable to a buffer overflow, as will be described in more detail below.

In the following, a specific embodiment of the present invention will be described in view of a video signal as the information signal. In the following, reference will also be made to FIG. 2, in order to illustrate the following specific embodiment in view of an exemplary application environment.

FIG. 3 shows an embodiment of an encoder 18 for encoding a video signal into a data stream. The encoder 18 comprises a precoder 50, an encoder 52 and an arranging unit 54, all connected in series between an input 56 and an output 58 of the encoder 18. At the input 56 the encoder 18 receives the video signal, wherein in FIG. 3 illustratively one picture 12 of the video signal is shown. All pictures of the video signal are composed of a plurality of pixels or picture samples arranged in rows and columns.

The video signal or pictures 12 are fed via input 56 to the video precoder 50. The video precoder 50 treats the pictures 12 in units of so-called macroblocks 12a, i.e. blocks of, for example, 4×4 pixel samples. On each macroblock 12a, precoder 50 performs a transformation into spectral transformation coefficients followed by a quantization into transform coefficient levels. Moreover, intra-frame prediction or motion compensation is used so that the afore-mentioned steps are performed not directly on the pixel data but on the differences between the pixel data and predicted pixel values, thereby yielding small values which may be compressed more easily.

The macroblocks into which the picture 12 is partitioned are grouped into several slices. For each slice a number of syntax elements are generated which form a coded version of the macroblocks of the slice. For illustration purposes, in FIG. 3 the picture 12 is shown as being partitioned into three slice groups or slices 12b.

The syntax elements output by precoder 50 are dividable into several categories or types. The encoder 52 collects the syntax elements of the same category belonging to the same slice of the same picture 12 of a sequence of pictures and groups them to obtain data packets. In particular, in order to obtain a data packet, the encoder 52 forms a compressed representation of the syntax elements belonging to a specific data packet to obtain payload data. To this payload data, encoder 52 attaches a type number indicating the data packet type to obtain a data packet. The precoder 50 and the encoder 52 of the encoder 18 form a so-called video coding layer (VCL) for efficiently representing the video content.

The data packets output by encoder 52 are arranged into a data stream by arranging unit 54, as will be described in more detail with respect to FIG. 4. The arranging unit 54 represents the network abstraction layer (NAL) of encoder 18 for formatting the VCL representation of the video and providing header information in a manner appropriate for conveyance by a variety of transport layers or storage media.

The structure of the data stream output by encoder 18 of FIG. 3 is described in more detail below with respect to FIG. 4. In FIG. 4, the data stream output at output 58 is shown at 70. The data stream 70 is organized in consecutive blocks 72 of coded video sequences of consecutive pictures of a video. The coded video sequence blocks 72 internally consist of a series of access units 74 that are sequential in the data stream 70. Each coded video sequence 72 can be decoded independently of any other coded video sequence 72 from the data stream 70, given the necessary parameter set information, which may be conveyed “in-band” or “out-of-band” as will be described in more detail below. Each coded video sequence 72 uses only one sequence parameter set.

At the beginning of a coded video sequence 72 is an access unit 74 of a special type, called an instantaneous decoding refresh (IDR) access unit. An IDR access unit contains an intra picture, i.e. a coded picture that can be decoded without decoding any previous pictures in the data stream 70. The presence of an IDR access unit in the data stream 70 indicates that no subsequent picture in the stream 70 will require reference to pictures prior to the intra picture it contains in order to be decoded. The data stream 70 may contain one or more coded video sequences 72.

An access unit 74 is a set of NAL units 76 in a specified form, the specified form being explained in more detail below. The decoding of each access unit 74 results in one decoded picture. In the following, the data stream 70 is also sometimes called NAL unit stream 70.

The NAL units 76 correspond to the data packets mentioned above with respect to FIG. 3. In other words, the coded video data is organized by encoder 52 in NAL units 76. Each NAL unit 76 is effectively a packet that contains an integer number of bytes. The first byte of each NAL unit is a header byte 78 that contains an indication of the type of data in the NAL unit, and the remaining bytes contain payload data 80 of the type indicated by header 78.
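The split of a NAL unit 76 into header byte 78 and payload data 80 may be sketched as follows. The bit layout of the header byte used here (one forbidden zero bit, two bits of nal_ref_idc, five bits of nal_unit_type) is an assumption taken from the H.264/AVC standard and is not stated in the present description; the names `NalHeader`, `parse_nal_header` and `split_nal_unit` are hypothetical.

```python
from typing import NamedTuple, Tuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(header_byte: int) -> NalHeader:
    """Decode the single header byte of a NAL unit.

    Assumed layout (H.264/AVC): 1 bit forbidden_zero_bit,
    2 bits nal_ref_idc, 5 bits nal_unit_type.
    """
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("header must be a single byte")
    return NalHeader(
        forbidden_zero_bit=(header_byte >> 7) & 0x01,
        nal_ref_idc=(header_byte >> 5) & 0x03,
        nal_unit_type=header_byte & 0x1F,
    )


def split_nal_unit(nal_unit: bytes) -> Tuple[NalHeader, bytes]:
    """Return the decoded header byte and the remaining payload bytes."""
    if not nal_unit:
        raise ValueError("a NAL unit contains at least the header byte")
    return parse_nal_header(nal_unit[0]), nal_unit[1:]
```

Under this assumed layout, a header byte of 0x67 would denote a sequence parameter set (type 7) and 0x65 a coded slice of an IDR picture (type 5).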

The payload data 80 in the NAL units 76 may be interleaved, as necessary, with emulation prevention bytes. Emulation prevention bytes are bytes inserted with a specific value to prevent a particular pattern of data, called a start code prefix, from being accidentally generated inside the payload.
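A minimal sketch of how emulation prevention bytes might be inserted and removed is given below. The concrete values are assumptions borrowed from H.264/AVC rather than from the present description: the inserted byte is taken to be 0x03, and it is inserted whenever two consecutive zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03. The function names are hypothetical.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that are followed by 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes written so far
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte (assumed value)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

With these assumptions, the three-byte pattern 0x000001 can no longer appear inside the protected payload, and removing the inserted bytes at the decoder restores the original data.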

The NAL unit structure definition specifies a generic format for use in both packet-oriented and bit stream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as the NAL unit stream 70.

For example, some systems require delivery of the entire or partial NAL unit stream 70 as an ordered stream of bytes or bits within which the locations of NAL unit boundaries 82 need to be identifiable from patterns within the coded data itself.

For use in such systems, encoder 18 creates data stream 70 in a byte stream format. In the byte stream format, each NAL unit 76 is prefixed by a specific pattern of, for example, three bytes, called a start code prefix. This start code prefix is not shown in FIG. 4 since it is optional. If present, the start code prefix of a NAL unit precedes the header byte 78. The boundaries of the NAL units 76 can then be identified by searching the coded data for the unique start code prefix pattern. Moreover, the NAL data stream output by encoder 18 of FIG. 3 may be interleaved with emulation prevention bytes within the payload data blocks 80 of the NAL units 76 in order to guarantee that start code prefixes are unique identifiers of the start of a new NAL unit 76. A small amount of additional data (one byte per video picture) may also be added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream.
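Locating NAL unit boundaries in the byte stream format may be illustrated by the following sketch. It assumes the three-byte start code prefix 0x000001 used in H.264/AVC, and it assumes that zero bytes directly preceding a start code prefix are padding rather than NAL unit data; neither value is fixed by the present description. The function name `split_byte_stream` is hypothetical.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"  # assumed three-byte pattern


def split_byte_stream(stream: bytes) -> List[bytes]:
    """Return the NAL units (header byte plus payload) found in a byte stream."""
    units = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        begin = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, begin)
        end = len(stream) if nxt == -1 else nxt
        # Assumption: trailing zero bytes belong to the next start code
        # (e.g. a four-byte variant 0x00000001), not to this NAL unit.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units
```

The search is only reliable because emulation prevention keeps the prefix pattern out of the payload data blocks 80.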

Additional data could also be inserted into the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired.

In other systems, like internet protocol or RTP systems, the coded data or data stream 70 is carried in packets that are framed by the system transport protocol, so that the boundaries of NAL units within the packets can be identified without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data of NAL units 76 would be a waste of data-carrying capacity, so instead the NAL units 76 can be carried in data packets without start code prefixes.

NAL units are classified into VCL and non-VCL NAL units. The VCL NAL units contain the data that represents the values of the samples in the video pictures 12 and are, therefore, necessary for decoding, and the non-VCL NAL units contain any associated additional information such as parameter sets, i.e. important header data that can apply to a large number of VCL NAL units, and supplemental enhancement information, such as timing information and other supplemental data that may enhance usability of the decoded video signal (42 in FIG. 1) but are not necessary for decoding the values of the samples in the video pictures 12.

A parameter set is supposed to contain information that is expected to change rarely and that is used in the decoding of a large number of VCL NAL units. There are two types of parameter sets:

-   sequence parameter sets, which apply to a series of consecutive coded video pictures called a coded video sequence, and
-   picture parameter sets, which apply to the decoding of one or more individual pictures 12 within a coded video sequence 72.

The sequence and picture parameter set mechanism, which is described in more detail below, decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures 12. Each VCL NAL unit 76 contains in its payload data portion 80 an identifier that refers to the content of the relevant picture parameter set, and each picture parameter set non-VCL NAL unit contains in its payload data portion 80 an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data, i.e. the identifier, can be used to refer to a larger amount of information, i.e. the parameter set, without repeating that information within each VCL NAL unit.

Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss, as will be described in more detail below. In some applications, parameter sets may be sent within the channel that carries the VCL NAL units, termed “in-band” transmission. In other applications, it can be advantageous to convey the parameter sets “out-of-band” using a more reliable transport mechanism or transmission link than the video channel for transmitting the NAL data stream 70 itself, as will be described in the following with respect to FIGS. 6 and 7.

Now, before explaining in detail the predetermined order among the NAL unit types in accordance with the present embodiment, the different NAL unit types are listed in Table 1 below, along with their associated NAL unit type numbers, for reasons of completeness.

TABLE 1: NAL units

| NAL unit type | Content of NAL unit and RBSP syntax structure | C |
|---|---|---|
| 0 | Unspecified | |
| 1 | Coded slice of a non-IDR picture, slice_layer_without_partitioning_NAL unit( ) | 2, 3, 4 |
| 2 | Coded slice data partition A, slice_data_partition_a_layer_NAL unit( ) | 2 |
| 3 | Coded slice data partition B, slice_data_partition_b_layer_NAL unit( ) | 3 |
| 4 | Coded slice data partition C, slice_data_partition_c_layer_NAL unit( ) | 4 |
| 5 | Coded slice of an IDR picture, slice_layer_without_partitioning_NAL unit( ) | 2, 3 |
| 6 | Supplemental enhancement information (SEI), sei_NAL unit( ) | 5 |
| 7 | Sequence parameter set, seq_parameter_set_NAL unit( ) | 0 |
| 8 | Picture parameter set, pic_parameter_set_NAL unit( ) | 1 |
| 9 | Access unit delimiter, access_unit_delimiter_NAL unit( ) | 6 |
| 10 | End of sequence, end_of_seq_NAL unit( ) | 7 |
| 11 | End of stream, end_of_stream_NAL unit( ) | 8 |
| 12 | Filler data, filler_data_NAL unit( ) | 9 |
| 13 . . . 23 | Reserved | |
| 24 . . . 31 | Unspecified | |

As can be seen from Table 1, NAL units 76 having NAL unit type 1 indicated in their header byte 78 belong to one of the non-IDR access units, i.e. one of the access units 74 which succeed the first access unit of each coded video sequence 72, which is the IDR access unit as mentioned before. Moreover, as indicated in Table 1, a NAL unit 76 of NAL unit type 1 represents a coded version of a slice of a non-IDR picture, i.e. a picture other than the first picture of a coded video sequence 72. As is shown in the last column of Table 1, in NAL units 76 of NAL unit type 1 syntax elements of categories C=2, 3 and 4 are combined.

At the side of the encoder, it may have been decided not to combine the syntax elements of categories 2, 3 and 4 of one slice in one common NAL unit 76. In this case, partitioning is used in order to distribute the syntax elements of the different categories 2, 3 and 4 to NAL units of different NAL unit types, i.e. NAL unit types 2, 3 and 4 for categories C=2, 3 and 4, respectively. To be more specific, partition A contains all syntax elements of category 2. Category 2 syntax elements include all syntax elements in the slice header and slice data syntax structures other than the syntax elements concerning single transform coefficients. Generally speaking, partition A syntax elements as contained in NAL units of NAL unit type 2 are more important than the syntax elements contained in NAL units 76 of NAL unit types 3 and 4. These latter NAL units contain syntax elements of categories 3 and 4, which include syntax elements concerning transform coefficients.

As can be seen, slice data partitioning is not possible within the first picture of a video sequence, so that coded versions of slices of an IDR picture are conveyed by NAL units 76 of NAL unit type 5.

NAL units 76 of NAL unit type 6 contain in their payload data portion 80 supplemental enhancement information (SEI), examples of which were mentioned above.

NAL units 76 of NAL unit type 7 contain in their payload data 80 a sequence parameter set, while NAL units 76 of NAL unit type 8 contain in their payload data 80 a picture parameter set.

NAL units 76 of NAL unit type 9 are called access unit delimiters and indicate the beginning of an access unit. As will become apparent from the following description, access unit delimiters are optional and not necessary for parsing of the NAL data stream 70.

NAL units of NAL unit types 10 and 11 are NAL units indicating the end of a sequence and the end of the whole data stream, respectively. NAL units 76 of NAL unit type 12 contain in their payload portion 80 filler data as may be necessary for some networks. NAL unit types 13 to 23 and 24 to 31 pertain to reserved or unspecified NAL unit types for specific applications.

Now, after having described rather broadly the structure of the NAL unit stream 70 generated by the encoder 18 of FIG. 3, the constraints on the order of the NAL units 76 in the bit stream 70 are described in more detail with reference to Table 1 and FIG. 4. Any order of NAL units 76 in the data or bit stream 70 obeying the constraints mentioned below is, in accordance with the present embodiment of the present invention, in conformity with the parsing rules used by a decoder of interest in order to retrieve the coded information, i.e. the video signal. Decoders using those parsing rules shall be capable of receiving NAL units 76 in this parsing or decoding order and retrieving the syntax elements.

In the following, the positioning of sequence and picture parameter set NAL units, i.e. NAL units of NAL unit types 7 and 8, is specified first. Then, the order of access units 74 is specified. Then, the order of NAL units 76 and coded pictures 12 and their association to access units 74 is specified. Finally, the order of VCL NAL units and their association to coded pictures is described.

As mentioned before, NAL units 76 are classified into VCL and non-VCL NAL units. The VCL NAL units contain the data that represent the values of the samples in the video pictures, and the non-VCL NAL units contain any associated additional information such as parameter sets and supplemental enhancement information, such as timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures. With reference to Table 1, which specifies the type of RBSP data structure contained in the NAL unit 76, VCL NAL units are specified as those NAL units having NAL_unit_type=1, 2, 3, 4, 5 or 12; all remaining NAL units are called non-VCL NAL units.

Removable NAL units are the NAL units having a NAL unit type other than 1-5, as well as the NAL units having NAL unit type 1-5 and, concurrently, having a syntax element indicating that they concern redundant pictures.
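The notion of a removable NAL unit can be summarized in a short sketch. The type names are copied from Table 1; the function name `is_removable` and the convention of passing the redundant picture count as an integer argument are illustrative assumptions, with a value of zero standing for the primary coded picture as described further below.

```python
# NAL unit type numbers and contents as listed in Table 1.
NAL_UNIT_TYPES = {
    1: "coded slice of a non-IDR picture",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
    12: "filler data",
}


def is_removable(nal_unit_type: int, redundant_pic_cnt: int = 0) -> bool:
    """True if the NAL unit may be dropped from the data stream.

    Types 1 to 5 are removable only when they carry a redundant coded
    picture (redundant_pic_cnt != 0); every other type is removable.
    """
    if 1 <= nal_unit_type <= 5:
        return redundant_pic_cnt != 0
    return True
```

For instance, an SEI NAL unit or a parameter set NAL unit would be reported as removable, whereas a slice of a primary coded picture would not.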

In the following, the payload data 80 is sometimes called Raw Byte Sequence Payload or RBSP. The RBSP 80 is a syntax structure containing an integer number of bytes that is encapsulated in a NAL unit 76. An RBSP is either empty or has the form of a string of data bytes containing syntax elements followed by an RBSP stop bit and followed by zero or more subsequent bytes equal to zero.
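How a parser might locate the end of the syntax elements inside an RBSP is sketched below. The sketch assumes, as in H.264/AVC, that the RBSP stop bit has the value one and that the last byte containing data is padded with zero bits after the stop bit; the present description only states that a stop bit and zero-valued data follow the syntax elements. The function name is hypothetical.

```python
def rbsp_payload_bit_length(rbsp: bytes) -> int:
    """Number of syntax-element bits that precede the RBSP stop bit.

    Returns 0 for an empty RBSP.
    """
    data = rbsp.rstrip(b"\x00")  # drop trailing bytes equal to zero
    if not data:
        return 0
    last = data[-1]
    # Position of the lowest set bit in the last byte = assumed stop bit.
    trailing_zero_bits = (last & -last).bit_length() - 1
    return len(data) * 8 - trailing_zero_bits - 1
```

For the two bytes 0xAB 0x80, for example, the second byte holds only the stop bit, so the sketch reports eight bits of syntax elements.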

In this way, a NAL unit 76 is a syntax structure containing an indication of the type of data to follow, i.e. the header byte 78, and bytes 80 containing the data in the form of an RBSP interspersed as necessary with emulation prevention bytes, as already noted above.

On the other hand, an access unit 74 comprises one primary coded picture, zero or more corresponding redundant coded pictures, and zero or more non-VCL NAL units. The association of VCL NAL units to primary or redundant coded pictures or access units is described below.

In order to allow the removal of removable NAL units 76 from data stream 70 while retaining the decoding or parsing order, the format of the access unit 74 is as shown in FIG. 5. The NAL units 76 that can be removed are all types except VCL NAL units of a primary coded picture, i.e. all NAL unit types except NAL unit types 1 to 5.

As shown in FIG. 5, each access unit contains in any case a set of VCL NAL units that together compose a primary coded picture 100. An access unit may be prefixed with an access unit delimiter 102, i.e. a NAL unit having NAL_unit_type 9, to aid in locating the start of the access unit 74. Some supplemental enhancement information (SEI) in the form of SEI NAL units of NAL unit type 6, containing data such as picture timing information, may also precede the primary coded picture 100, this SEI block being indicated by reference number 104.

The primary coded picture consists of a set of VCL NAL units 76 consisting of slices or slice data partitions that represent samples of the video picture.

Following the primary coded picture 100 may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures 106, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures 100. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture with which the access unit 74 is associated is the last picture of a coded video sequence 72, wherein a sequence of pictures is independently decodable and uses only one sequence parameter set, an end of sequence NAL unit 108 may be present to indicate the end of the sequence 72. And if the coded picture is the last coded picture in the entire NAL unit stream 70, an end of stream NAL unit 110 may be present to indicate that the stream 70 is ending.

FIG. 5 shows the structure of access units not containing any NAL units with NAL_unit_type=0, 7, 8 or in the range of to 31, inclusive. The reason for having limited the illustration of access units to cases where NAL units of the aforementioned types have been removed is that, as already noted above, sequence and picture parameter sets in NAL units of NAL unit types 7 and 8 may, in some applications, be conveyed “out-of-band” using a reliable transport mechanism or, in a redundant manner, in-band. Thus, an encoder 18 may output the sequence and picture parameter sets in-band, i.e. in the data stream 70, or out-of-band, i.e. using an extra output terminal.

In any case, the encoder 18 or any means in between the encoder 18 and the decoder 28 has to guarantee that the following constraints on the order of sequence and picture parameter set RBSPs and their activation are obeyed.

A picture parameter set RBSP includes parameters that can be referred to by coded slice NAL units or coded slice data partition A NAL units of one or more coded pictures.

-   I) When a picture parameter set RBSP having a particular value of pic_parameter_set_id, i.e. the header byte 78, is referred to by a coded slice NAL unit or coded slice data partition A NAL unit using that value of pic_parameter_set_id, it is activated. This picture parameter set RBSP is called the active picture parameter set RBSP until it is deactivated by the activation of another picture parameter set RBSP. A picture parameter set RBSP, with that particular value of pic_parameter_set_id, shall be available to the decoding process at decoder 28 prior to its activation. Thus, the encoder 18 has to take this into account when transmitting sequence and picture parameter sets in-band or out-of-band.

Any picture parameter set NAL unit containing the value of pic_parameter_set_id for the active picture parameter set RBSP shall have the same content as that of the active picture parameter set RBSP unless it follows the last VCL NAL unit of a coded picture and precedes the first VCL NAL unit of another coded picture.

A sequence parameter set RBSP includes parameters that can be referred to by one or more picture parameter set RBSPs or one or more SEI NAL units containing a buffering period SEI message.

-   II) When a sequence parameter set RBSP (with a particular value of seq_parameter_set_id) is referred to by activation of a picture parameter set RBSP (using that value of seq_parameter_set_id) or is referred to by an SEI NAL unit containing a buffering period SEI message (using that value of seq_parameter_set_id), it is activated. This sequence parameter set RBSP is called the active sequence parameter set RBSP until it is deactivated by the activation of another sequence parameter set RBSP. A sequence parameter set RBSP, with that particular value of seq_parameter_set_id, shall be available to the decoding process prior to its activation. An activated sequence parameter set RBSP shall remain active for the entire coded video sequence.

Any sequence parameter set NAL unit containing the value of seq_parameter_set_id for the active sequence parameter set RBSP shall have the same content as that of the active sequence parameter set RBSP unless it follows the last access unit of a coded video sequence and precedes the first VCL NAL unit and the first SEI NAL unit containing a buffering period SEI message (when present) of another coded video sequence.
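The activation rules I) and II) may be pictured as a small store of parameter sets kept at the decoder side. The following is only a sketch under simplifying assumptions: parameter set contents are opaque objects, activation through a buffering period SEI message is not modelled, and the class name `ParameterSetStore`, its methods and the `starts_new_sequence` flag are hypothetical.

```python
class ParameterSetStore:
    """Buffers parameter sets and tracks which ones are active."""

    def __init__(self):
        self.sps = {}  # seq_parameter_set_id -> content
        self.pps = {}  # pic_parameter_set_id -> (seq_parameter_set_id, content)
        self.active_sps_id = None
        self.active_pps_id = None

    def store_sps(self, sps_id, content):
        self.sps[sps_id] = content

    def store_pps(self, pps_id, sps_id, content):
        self.pps[pps_id] = (sps_id, content)

    def activate_for_slice(self, pps_id, starts_new_sequence=False):
        """Activate the parameter sets referred to by a coded slice."""
        # Rule I: the picture parameter set must be available beforehand.
        if pps_id not in self.pps:
            raise KeyError("picture parameter set %r not available" % pps_id)
        sps_id, pps_content = self.pps[pps_id]
        # Rule II: the referenced sequence parameter set must be available.
        if sps_id not in self.sps:
            raise KeyError("sequence parameter set %r not available" % sps_id)
        # Rule II: the active sequence parameter set stays active for the
        # entire coded video sequence.
        if (self.active_sps_id is not None
                and sps_id != self.active_sps_id
                and not starts_new_sequence):
            raise ValueError("sequence parameter set changed within a sequence")
        self.active_pps_id = pps_id
        self.active_sps_id = sps_id
        return self.sps[sps_id], pps_content
```

Whether the parameter sets reach such a store in-band or out-of-band makes no difference to the sketch, as long as they arrive before the first slice that refers to them.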

In the following, the order of NAL units and coded pictures and their association to access units is described in more detail, as before with reference to FIG. 5.

An access unit 74 consists of one primary coded picture 100, zero or more corresponding redundant coded pictures 106, and zero or more non-VCL NAL units 102, 104, 108 and 110, as already mentioned above.

The association of VCL NAL units to primary or redundant coded pictures is described below.

The first of any of the following NAL units 76 after the last VCL NAL unit of a primary coded picture 100 specifies the start of a new access unit.

-   a) access unit delimiter NAL unit (NAL unit type 9) (when present)
-   b) sequence parameter set NAL unit (NAL unit type 7) (when present)
-   c) picture parameter set NAL unit (NAL unit type 8) (when present)
-   d) SEI NAL unit (NAL unit type 6) (when present)
-   e) NAL units with nal_unit_type in the range of 13 to 18, inclusive
-   f) first VCL NAL unit of a primary coded picture (NAL unit type 1-5) (always present)
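The boundary rule a) to f) can be sketched as a single pass over the NAL unit stream. For brevity, each NAL unit is reduced to a pair of its type number and a flag telling whether it is the first VCL NAL unit of a primary coded picture; how that flag is obtained from the slice data is governed by the constraints specified further below. The function name `split_access_units` and this input representation are hypothetical.

```python
from typing import Iterable, List, Tuple

# Rules a) to e): non-VCL types that open a new access unit.
AU_START_TYPES = {6, 7, 8, 9} | set(range(13, 19))


def split_access_units(nal_units: Iterable[Tuple[int, bool]]) -> List[List[int]]:
    """Group NAL unit types into access units.

    Each input item is (nal_unit_type, starts_new_primary_picture).
    """
    access_units = []
    current = []
    seen_vcl = False  # a VCL NAL unit of the current access unit was seen
    for nal_unit_type, starts_new_picture in nal_units:
        is_vcl = 1 <= nal_unit_type <= 5
        opens_new_unit = nal_unit_type in AU_START_TYPES or (
            is_vcl and starts_new_picture)  # rule f)
        if seen_vcl and opens_new_unit:
            access_units.append(current)
            current = []
            seen_vcl = False
        current.append(nal_unit_type)
        if is_vcl:
            seen_vcl = True
    if current:
        access_units.append(current)
    return access_units
```

Because rule f) alone suffices to open a new access unit, the same grouping of VCL NAL units results even after all delimiters, SEI and parameter set NAL units have been removed from the stream.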

The constraints for the detection of the first VCL NAL unit of a primary coded picture are specified further below and can be used, given the above claimed restriction, to distinguish access units even if NAL units that are allowed to be removed are removed. The NAL units that can be removed are all types except VCL NAL units of a primary coded picture.

The following constraints shall be obeyed by the order of the coded pictures and non-VCL NAL units within an access unit.

-   g) When an access unit delimiter NAL unit (NAL unit type 9) is present, it shall be the first NAL unit. There shall be at most one access unit delimiter NAL unit in any access unit.
-   h) When any SEI NAL units (NAL unit type 6) are present, they shall precede the primary coded picture.
-   i) When an SEI NAL unit containing a buffering period SEI message is present, the buffering period SEI message shall be the first SEI message payload of the first SEI NAL unit in the access unit, wherein a buffering period SEI NAL unit is for controlling the buffering management at decoder's side.
-   j) The primary coded picture (consisting of NAL units of NAL unit types 1-5 and having a redundant picture count value equal to zero) shall precede the corresponding redundant coded pictures.
-   k) When redundant coded pictures (consisting of NAL units of NAL unit types 1-5 and having a redundant picture count value not equal to zero) are present, they shall be ordered in ascending order of the redundant picture count value redundant_pic_cnt.
-   l) When an end of sequence NAL unit (NAL unit type 10) is present, it shall follow the primary coded picture and all redundant coded pictures (if any).
-   m) When an end of stream NAL unit (NAL unit type 11) is present, it shall be the last NAL unit.
-   n) NAL units having nal_unit_type equal to 0, 12, or in the range of 19 to 31, inclusive, shall not precede the first VCL NAL unit of the primary coded picture.
-   o) Sequence parameter set NAL units or picture parameter set NAL units may be present in an access unit, but cannot follow the last VCL NAL unit of the primary coded picture within the access unit, as this condition would specify the start of a new access unit (see constraint b)).
-   p) When a NAL unit having nal_unit_type equal to 7 or 8 is present in an access unit, it may not be referred to in the coded pictures of the access unit in which it is present, and may be referred to in coded pictures of subsequent access units.
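A subset of the ordering constraints within one access unit (g, h, j, k, l, m and n) lends itself to a simple conformance check, sketched below. Each NAL unit is represented as a pair of its type number and its redundant_pic_cnt value, with zero used for non-VCL NAL units; this representation and the function name `check_access_unit_order` are illustrative assumptions, and constraints i, o and p are not covered.

```python
from typing import List, Tuple


def check_access_unit_order(nal_units: List[Tuple[int, int]]) -> List[str]:
    """Return the letters of the violated constraints (empty if none).

    Each item is (nal_unit_type, redundant_pic_cnt).
    """
    types = [t for t, _ in nal_units]
    primary = [i for i, (t, r) in enumerate(nal_units) if 1 <= t <= 5 and r == 0]
    redundant = [i for i, (t, r) in enumerate(nal_units) if 1 <= t <= 5 and r != 0]
    if not primary:
        return ["no primary coded picture"]
    first_primary, last_primary = primary[0], primary[-1]
    last_vcl = max(primary + redundant)
    violated = []
    delimiters = [i for i, t in enumerate(types) if t == 9]
    if len(delimiters) > 1 or (delimiters and delimiters[0] != 0):
        violated.append("g")  # at most one delimiter, and it comes first
    if any(t == 6 and i > first_primary for i, t in enumerate(types)):
        violated.append("h")  # SEI precedes the primary coded picture
    if any(i < last_primary for i in redundant):
        violated.append("j")  # primary precedes redundant coded pictures
    counts = [nal_units[i][1] for i in redundant]
    if counts != sorted(counts):
        violated.append("k")  # ascending redundant_pic_cnt
    if any(t == 10 and i < last_vcl for i, t in enumerate(types)):
        violated.append("l")  # end of sequence follows all coded pictures
    if 11 in types and types.index(11) != len(types) - 1:
        violated.append("m")  # end of stream is the last NAL unit
    if any((t in (0, 12) or 19 <= t <= 31) and i < first_primary
           for i, t in enumerate(types)):
        violated.append("n")  # types 0, 12, 19..31 not before first VCL
    return violated
```

An access unit consisting of a delimiter, an SEI NAL unit, two primary slices, two redundant slices with counts one and two, an end of sequence and an end of stream NAL unit would pass this check without any reported violation.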

In the following, the order of VCL NAL units and their association to coded pictures is described in more detail.

-   q) Each VCL NAL unit is part of a coded picture.
-   r) The order of the VCL NAL units within a coded IDR picture is constrained as follows.
    -   If arbitrary slice order is allowed as specified by a certain syntax element, coded slice of an IDR picture NAL units may have any order relative to each other.
    -   Otherwise (arbitrary slice order is not allowed), the order of coded slice of an IDR picture NAL units shall be in the order of increasing macroblock address for the first macroblock of each coded slice of an IDR picture NAL unit.
-   s) The order of the VCL NAL units within a coded non-IDR picture is constrained as follows.
    -   If arbitrary slice order is allowed as specified by a specific syntax element, coded slice of a non-IDR picture NAL units or coded slice data partition A NAL units may have any order relative to each other. A coded slice data partition A NAL unit with a particular value of slice_id shall precede any present coded slice data partition B NAL unit with the same value of slice_id. A coded slice data partition A NAL unit with a particular value of slice_id shall precede any present coded slice data partition C NAL unit with the same value of slice_id. When a coded slice data partition B NAL unit with a particular value of slice_id is present, it shall precede any present coded slice data partition C NAL unit with the same value of slice_id.
    -   Otherwise (arbitrary slice order is not allowed), the order of coded slice of a non-IDR picture NAL units or coded slice data partition A NAL units shall be in the order of increasing macroblock address for the first macroblock of each coded slice of a non-IDR picture NAL unit or coded slice data partition A NAL unit. A coded slice data partition A NAL unit with a particular value of slice_id shall immediately precede any present coded slice data partition B NAL unit with the same value of slice_id. A coded slice data partition A NAL unit with a particular value of slice_id shall immediately precede any present coded slice data partition C NAL unit with the same value of slice_id when a coded slice data partition B NAL unit with the same value of slice_id is not present. When a coded slice data partition B NAL unit with the same value of slice_id is present, it shall immediately precede any present coded slice data partition C NAL unit with the same value of slice_id.
-   t) NAL units having nal_unit_type equal to 12 may be present in the access unit but shall not precede the first VCL NAL unit of the primary coded picture within the access unit.
-   u) NAL units having nal_unit_type equal to 0 or in the range of 24 to 31, inclusive, which are unspecified, may be present in the access unit but shall not precede the first VCL NAL unit of the primary coded picture within the access unit.
-   v) NAL units having nal_unit_type in the range of 19 to 23, inclusive, which are reserved, shall not precede the first VCL NAL unit of the primary coded picture within the access unit.

The creation of the data stream 70 is further restricted by the following constraints in order to enable the detection of the first VCL NAL unit of a primary coded picture:

-   w) Any coded slice NAL unit or coded slice data partition A NAL unit of the primary coded picture of the current access unit shall be different from any coded slice NAL unit or coded slice data partition A NAL unit of the primary coded picture of the previous access unit in one or more of the following ways.
    -   frame_num differs in value. frame_num is an identifier in each VCL NAL unit indicating the picture 12 of the video 10 a it belongs to. A value of frame_num may be assigned to more than one picture or access unit, but the value of frame_num in the payload data of the VCL NAL units of successive access units 74 may not be the same. In other words, frame_num is used as a unique identifier for each short-term reference frame. For example, when the current picture is an IDR picture, frame_num shall be equal to zero.
    -   field_pic_flag differs in value. field_pic_flag as contained in the payload data 80 of VCL NAL units specifies, if equal to one, that the slice is associated with a coded field, i.e. a field of an interlaced frame, and, if equal to zero, that the picture to which the VCL NAL unit having that field_pic_flag belongs is a coded frame, i.e. a coded interlaced or coded progressive frame.
    -   bottom_field_flag is present in both and differs in value. bottom_field_flag as contained in the payload data 80 of a VCL NAL unit specifies, if equal to one, that the slice is associated with a coded bottom field, whereas bottom_field_flag equal to zero specifies that the picture is a coded top field. To be more specific, a coded video sequence consists of a sequence of coded pictures, wherein a coded picture may represent either an entire frame or a single field. Generally, a frame of video can be considered to contain two interleaved fields, a top and a bottom field. The top field contains the even-numbered rows, whereas the bottom field contains the odd-numbered rows, for example. Frames in which the two fields are captured at different time instants are referred to as interlaced frames. Otherwise, a frame is referred to as a progressive frame.
    -   nal_ref_idc differs in value with one of the nal_ref_idc values being equal to 0. nal_ref_idc is an identifier that may be contained in the payload data 80 of a NAL unit. nal_ref_idc not equal to zero specifies that the content of the NAL unit contains a sequence parameter set or a picture parameter set or a slice of a reference picture or a slice data partition of a reference picture. Therefore, nal_ref_idc equal to zero for a NAL unit containing a slice or slice data partition indicates that the slice or slice data partition is part of a non-reference picture. nal_ref_idc shall not be equal to zero for a sequence parameter set or a picture parameter set in a NAL unit. If nal_ref_idc is equal to zero for one slice or slice data partition in a NAL unit of a particular picture, it shall be equal to zero for all slice and slice data partition NAL units of the picture. nal_ref_idc is, therefore, not equal to zero for IDR NAL units, i.e. NAL units with a nal_unit_type equal to 5. nal_ref_idc is equal to zero for all NAL units having a nal_unit_type equal to 6, 9, 10, 11 or 12. pic_order_cnt_type is a syntax element contained in payload data 80 in order to specify the method used to code the picture order count. The value of pic_order_cnt_type shall be in the range of 0 to 2, inclusive. pic_order_cnt_type shall not be equal to 2 in a sequence that contains two or more consecutive non-reference frames, complementary non-reference field pairs or non-paired non-reference fields in decoding order. pic_order_cnt_lsb specifies, when contained in the payload data 80 of a VCL NAL unit, the picture order count coded for the field of a coded frame or for a coded field. An IDR picture shall, for example, have pic_order_cnt_lsb equal to zero. delta_pic_order_cnt_bottom is a syntax element that specifies, when contained in the payload data 80 of a VCL NAL unit, the picture order count difference from the expected picture order count for the top field in a coded frame or a coded field.
    -   frame_num is the same for both and pic_order_cnt_type is equal to 1 for both and either delta_pic_order_cnt[0] differs in value, or delta_pic_order_cnt[1] differs in value. delta_pic_order_cnt[0] specifies the picture order count difference from the expected picture order count for the top field in a coded frame or for a coded field. delta_pic_order_cnt[1] specifies the picture order count difference from the expected picture order count for the bottom field in a coded frame.
    -   nal_unit_type is equal to 5 for both and idr_pic_id differs in value. idr_pic_id is a syntax element contained in the payload data 80 of an IDR picture in a VCL NAL unit and indicates an identifier for different IDR pictures of different coded video sequences 72.
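The comparison called for by constraint w) may be sketched as follows, covering exactly the differences enumerated in the list above. The container `SliceInfo`, its default values and the function name `starts_new_primary_picture` are hypothetical; a bottom_field_flag or idr_pic_id that is absent from a slice is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SliceInfo:
    """Syntax elements of one coded slice that matter for constraint w)."""
    nal_unit_type: int
    nal_ref_idc: int
    frame_num: int
    field_pic_flag: int = 0
    bottom_field_flag: Optional[int] = None  # None: not present
    pic_order_cnt_type: int = 0
    delta_pic_order_cnt: Tuple[int, int] = (0, 0)
    idr_pic_id: Optional[int] = None  # only meaningful for type 5


def starts_new_primary_picture(prev: SliceInfo, cur: SliceInfo) -> bool:
    """True if cur cannot belong to the same primary coded picture as prev."""
    if prev.frame_num != cur.frame_num:
        return True
    if prev.field_pic_flag != cur.field_pic_flag:
        return True
    if (prev.bottom_field_flag is not None
            and cur.bottom_field_flag is not None
            and prev.bottom_field_flag != cur.bottom_field_flag):
        return True
    if (prev.nal_ref_idc != cur.nal_ref_idc
            and 0 in (prev.nal_ref_idc, cur.nal_ref_idc)):
        return True
    # frame_num is known to be equal at this point.
    if (prev.pic_order_cnt_type == 1 and cur.pic_order_cnt_type == 1
            and prev.delta_pic_order_cnt != cur.delta_pic_order_cnt):
        return True
    if (prev.nal_unit_type == 5 and cur.nal_unit_type == 5
            and prev.idr_pic_id != cur.idr_pic_id):
        return True
    return False
```

A decoder or gateway could evaluate such a predicate on consecutive slices of primary coded pictures in order to find access unit borders without relying on any removable NAL unit.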

After having described an embodiment for an encoder 18 and its constraints for creation of a data stream 70, in the following there is described a possible functionality of a gateway 32 suitable for passing the data stream 70 of the encoder of FIG. 3 on to a receiver 28.

The gateway 32 receives the data stream 70 NAL unit-wise at step 120. At step 122, the gateway 32 investigates the type number, i.e. nal_unit_type, of the current NAL unit 76 just received in order to determine at step 124 whether this NAL unit is of a NAL unit type to be removed. For example, the NAL data stream 70 is of high performance and has several redundant coded pictures 106. In this case, it could be that gateway 32 decides to lower the redundancy level of the data stream 70 and removes from data stream 70 all NAL units 76 having NAL unit types 1 to 5 and concurrently having a syntax element in the payload data, called redundant_pic_cnt, different from 0, wherein redundant_pic_cnt equal to 0, in accordance with the present embodiment, indicates slices and slice data partitions belonging to the primary coded picture of an access unit. The reduction in redundancy is advantageous if the transmission link 30 between gateway 32 and receiver 28 has a low bit error rate.

Alternatively, gateway 32 decides to transmit sequence and picture parameter set NAL units via an extra transmission link (not shown in FIG. 2) to receiver 28. In this case, gateway 32 removes all NAL units of NAL unit types 7 and 8. Of course, it is possible that gateway 32 removes any combination of removable NAL unit types.

If the NAL unit 76 received at step 120 is to be removed, gateway 32 performs the removal of the current NAL unit from the data stream 70 and discards this NAL unit at step 126. Otherwise, gateway 32 determines at step 128 as to whether the NAL unit received at step 120 has to be transmitted to the receiver 28 safely or has to be left unchanged. For example, if the NAL unit just received is a parameter set NAL unit it has to be transferred to the receiver 28. In this case, there are two possibilities for gateway 32. In the first case, gateway 32 decides to transmit the parameter set NAL unit via an extra transmission link. In this case, gateway 32 removes, in step 130, the NAL unit from the data stream 70 and, then, transmits, in step 132, the NAL unit via the extra transmission link. In particular, gateway 32 can perform the transmission of step 132 several times. Gateway 32 just has to comply with the constraints on the order of sequence and picture parameter set RBSPs and their activation at decoder side as mentioned above (see I and II).

Alternatively, gateway 32 decides to transmit NAL units containing the parameter sets in-band. In this case, gateway 32 inserts, at step 134, the current NAL unit at another position of the data stream 70, to be more precise, at a preceding position of the NAL data stream 70. Of course, step 134 may be performed several times. Gateway 32 thus has to guarantee that the constraints on the order of sequence and picture parameter set RBSPs and their activation at receiver 28 are obeyed (see constraints o and p).

After any of steps 126, 128, 132 and 134, gateway 32 checks, at step 136, as to whether there are NAL units left in the data stream 70. If this is the case, the next NAL unit is received at step 120. Otherwise, the process of FIG. 6 ends and gateway 32 awaits the reception of the next NAL data stream 70.
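The loop of FIG. 6 may be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: NAL units are reduced to their nal_unit_type, the names `run_gateway`, `example_policy` and the three action labels are hypothetical, the in-band re-insertion of step 134 is not modelled, and the choice to discard type 6 (SEI) units is merely an example of a negligible NAL unit type.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical action labels for the branches of FIG. 6.
DISCARD, OUT_OF_BAND, FORWARD = "discard", "out_of_band", "forward"


def run_gateway(
    nal_types: Iterable[int], decide: Callable[[int], str]
) -> Tuple[List[int], List[int]]:
    """Walk the stream NAL unit-wise; return (in_band_stream, out_of_band_units)."""
    in_band: List[int] = []
    out_of_band: List[int] = []
    for nal_unit_type in nal_types:        # steps 120 and 136: next unit, if any
        action = decide(nal_unit_type)     # steps 122 to 128: inspect the type
        if action == DISCARD:              # step 126: remove and discard
            continue
        if action == OUT_OF_BAND:          # steps 130 and 132: remove, send via extra link
            out_of_band.append(nal_unit_type)
        else:                              # unit is left unchanged in the stream
            in_band.append(nal_unit_type)
    return in_band, out_of_band


def example_policy(nal_unit_type: int) -> str:
    # Illustrative policy only: drop SEI (6), move parameter sets (7, 8)
    # to the extra transmission link, keep everything else.
    if nal_unit_type == 6:
        return DISCARD
    if nal_unit_type in (7, 8):
        return OUT_OF_BAND
    return FORWARD
```

For instance, `run_gateway([7, 8, 6, 5, 1], example_policy)` keeps the two VCL NAL units in-band and diverts the two parameter set NAL units to the extra link.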

In order to illustrate the decoupling of the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures, the sequence and picture parameter set mechanism is illustrated in FIG. 7. FIG. 7 shows the encoder 18 and the receiving decoder 28. The data stream 70 is represented by an arrow. The data stream 70 passed from encoder 18 to decoder 28 comprises a NAL unit with VCL data that is encoded by means of a parameter set having a pic_parameter_set_id of 3 as an address or index in the slice header. As can be seen, the encoder 18 has generated several picture parameter sets, wherein in FIG. 7, the picture parameter sets having pic_parameter_set_id 1, 2 and 3, respectively, are shown representatively by small boxes 140. The transmission of the parameter set NAL unit is performed via an extra transmission link 142 which is illustrated by a double-headed arrow labeled “reliable parameter set exchange”. In particular, the content of the picture parameter set having pic_parameter_set_id of 3 is shown at 144 in more detail for illustration purposes. The picture parameter set having pic_parameter_set_id 3 contains information such as the video format used, i.e. PAL, and the entropy coding scheme used, such as one of a context adaptive binary arithmetic coding or a context adaptive variable length (Huffman) coding. So, the NAL unit with VCL data having pic_parameter_set_id as an index to the parameter set NAL unit 144 does not have to contain all the content of the parameter set NAL unit 144. Therefore, the amount of data contained in the stream 70 can be reduced. As mentioned above, the decoder 28 buffers the incoming parameter sets and indexes same by the pic_parameter_set_id in the current NAL units by use of the above explained activation mechanism (see I and II).
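The indexing of buffered parameter sets by pic_parameter_set_id may be pictured with a small lookup table at decoder side. The sketch below is an assumption-laden illustration only: the class `ParameterSetStore`, its methods and the dictionary content are hypothetical, and the activation constraints referred to above are not modelled.

```python
from typing import Dict


class ParameterSetStore:
    """Hypothetical decoder-side buffer for picture parameter sets."""

    def __init__(self) -> None:
        self._pps: Dict[int, dict] = {}

    def receive(self, pic_parameter_set_id: int, content: dict) -> None:
        # A parameter set may arrive in-band or via the extra link; a later
        # set with the same id replaces the buffered one.
        self._pps[pic_parameter_set_id] = content

    def lookup(self, pic_parameter_set_id: int) -> dict:
        # Called with the id found in the slice header of a VCL NAL unit.
        # Raises KeyError if the referenced set has not been received yet.
        return self._pps[pic_parameter_set_id]


store = ParameterSetStore()
# Placeholder content; the real parameter set carries many more fields.
store.receive(3, {"entropy_coding": "CABAC"})
active = store.lookup(3)
```

The VCL NAL unit thus carries only the small index, while the full content is transmitted once and looked up whenever needed.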

With respect to FIG. 8, in the following an embodiment for the functionality of receiver or decoder 28 is described. At step 160 decoder 28 receives a NAL unit 76 of a NAL unit data stream 70 which may have been modified by the gateway 32 by the process described with respect to FIG. 6 relative to the original version of the data stream as created by encoder 18. At step 162, the decoder 28 buffers the NAL unit 76 in a buffer having a predetermined buffer space exceeding a predetermined standardized minimum buffer size known to the encoder. Next, at step 164, the decoder 28 detects the beginning of a new access unit. In other words, the decoder 28 checks as to whether the NAL unit just received at step 160 is the first of a new access unit.

The detection in step 164 is performed by use of the afore-mentioned constraints on the order of NAL units and coded pictures and the association to access units (see a-f). In particular, the decoder 28 detects the beginning of a new access unit if the NAL unit received at step 160 is the first of any of the following NAL units after the last VCL NAL unit of a primary coded picture of the current access unit:

-   Access unit delimiter NAL unit (when present)
-   sequence parameter set NAL unit (when present)
-   picture parameter set NAL unit (when present)
-   SEI NAL unit (when present)
-   NAL units with nal_unit_type in the range of 13 to 18, inclusive
-   first VCL NAL unit of a primary coded picture (always present)
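The detection rule listed above may be sketched as a small stream splitter. The following is a simplified illustration and not the decoder's actual implementation: the function name `split_access_units` is hypothetical, the numeric type values 6 (SEI), 7, 8 and 9 (access unit delimiter) are assumed from the H.264 NAL unit type table, and the decision whether a VCL NAL unit is the first one of a new primary coded picture is assumed to be supplied by the caller from the slice header comparison described above.

```python
from typing import Iterable, List, Tuple

# nal_unit_type values that open a new access unit once a VCL NAL unit of
# the current access unit has been seen (assumed: 6 SEI, 7 SPS, 8 PPS,
# 9 access unit delimiter, 13..18 as listed in the text).
AU_START_TYPES = {6, 7, 8, 9} | set(range(13, 19))


def split_access_units(nals: Iterable[Tuple[int, bool]]) -> List[List[int]]:
    """Group NAL unit types into access units.

    Each input item is (nal_unit_type, starts_new_primary_picture).
    """
    units: List[List[int]] = []
    current: List[int] = []
    seen_vcl = False
    for nal_unit_type, starts_new_picture in nals:
        is_vcl = 1 <= nal_unit_type <= 5
        boundary = seen_vcl and (
            nal_unit_type in AU_START_TYPES or (is_vcl and starts_new_picture)
        )
        if boundary:
            units.append(current)   # close the current access unit
            current = []
            seen_vcl = False
        current.append(nal_unit_type)
        if is_vcl:
            seen_vcl = True
    if current:
        units.append(current)
    return units
```

Removing the removable NAL units (here types 6 to 9) from the input changes the content of the resulting access units but not the number of access units or the grouping of the remaining VCL NAL units, which is the property the ordering constraints are meant to guarantee.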

It is noted that the decoder 28 can detect the presence of a last VCL NAL unit of a primary coded picture 100 by means of the assumption that the payload data of all the VCL NAL units of the primary coded picture 100 have to yield a complete pre-coded version of one picture as well as by means of the constraints mentioned above at (w).

When a new access unit has been detected (step 166), the decoder 28 deallocates or flushes buffer space at step 168 by removing an old access unit stored in the buffer. Thereupon, at step 170, the decoder 28 makes available the picture derived from the current access unit, i.e. the access unit which precedes the new access unit just detected in step 164.

Otherwise, i.e. if no new access unit has been detected (step 166), or after step 170, the decoder 28 decodes the NAL unit received at step 160 in order to receive the syntax elements contained therein.

The process then loops back to step 160. As may have become clear from the foregoing description, the decoder 28 is not liable to a buffer overflow as long as (1) the encoder 18 has created a NAL unit data stream 70 with access unit sizes that comply with the maximum buffer size and (2) gateway 32 leaves the data stream 70 unchanged, merely removes and discards removable and negligible NAL units from the data stream 70, merely removes removable but essential NAL units from the data stream 70 while transmitting them via an extra transmission link or, alternatively, inserts NAL units merely in access units so that the resulting access unit size does not result in a buffer overflow at decoder's side. Anyway, by the above-described constraints on the creation of the data stream, the decoder 28 is in any case capable of detecting the beginning of a new access unit in an unambiguous and exact way. Therefore, it is possible for the encoder 18 and the gateway 32 to forecast the buffer space consumption at decoder side and, therefore, to avoid buffer space overflow, provided the decoder has the minimum amount of buffer space.
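Such a forecast may be sketched as a check of the peak buffer occupancy against the minimum buffer size. The sketch rests on assumptions that go beyond the text: the function names are hypothetical, sizes are given in bytes per NAL unit, every access unit contains at least one NAL unit, and, since in FIG. 8 a NAL unit is buffered at step 162 before the old access unit is flushed at step 168, the peak is taken to be one complete access unit plus the first NAL unit of the following one.

```python
from typing import List


def peak_buffer_occupancy(access_units: List[List[int]]) -> int:
    """access_units[i] lists the NAL unit sizes (bytes) of access unit i."""
    peak = 0
    for i, au in enumerate(access_units):
        occupancy = sum(au)
        if i + 1 < len(access_units):
            # The first NAL unit of the next access unit is already buffered
            # when the boundary is detected and the old unit is flushed.
            occupancy += access_units[i + 1][0]
        peak = max(peak, occupancy)
    return peak


def avoids_overflow(access_units: List[List[int]], min_buffer_size: int) -> bool:
    # Encoder or gateway side check against the standardized minimum size.
    return peak_buffer_occupancy(access_units) <= min_buffer_size
```

A gateway that intends to insert a NAL unit into an access unit could run the same check on the modified size list before performing the insertion.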

As may be clear from the above, the present invention is not restricted to multimedia, video or audio signals. Moreover, it is noted with respect to FIG. 2 that other constellations in which the present invention could be used are also possible. For example, more than one gateway 32 could be interposed between the data stream source (encoder) and the decoder. With respect to FIG. 6 it is noted that the gateway 32 does not have to implement all of the options shown in FIG. 6. For example, a gateway could be designed to implement merely the removal of NAL units from the data stream without implementing steps 128 to 134. Alternatively, a gateway could implement all steps of FIG. 6 except step 134 or all steps except 130 and 132.

With regard to the decoder of FIG. 8, it is noted that the buffering management described there helps in standardizing the data stream format of the data stream 70 of that embodiment. Nevertheless, the buffer management may be realized in a different way, for example with de-allocating buffer space in other units than access units.

In other words, in accordance with the above embodiments each syntax element is placed into a logical packet called a NAL unit. Rather than forcing a specific bitstream interface to the system as in prior video standards, the NAL unit syntax structure allows greater customization of the method of carrying the video content in a manner appropriate for each specific network. In particular, the above embodiment defines how NAL units are to be ordered within access units. The constraints formulated on the order of NAL units specify the decoding order that must be accepted by a standard-conforming decoder, allowing a novel degree of freedom. Moreover, the ordering of the NAL units and their arrangement specifies access units and makes the distinction between various access units possible even if NAL units that are allowed to be removed from the bitstream are removed.

The above embodiments permit, by their new way of defining the decoding order, an increased degree of flexibility that is especially important in internet applications where each NAL unit is typically transported in one packet and shuffling is likely to occur. This permits simpler decoder implementations.

The distinction between various access units, even if units that are allowed to be removed from the bitstream are removed, permits flexible rate shaping and transcoding of data and makes the method robust to transmission errors. The automatic distinction method also increases coding efficiency by making start codes or delimiter codes superfluous.

Depending on an actual implementation, the inventive encoding/decoding/converting methods can be implemented in hardware or in software. Therefore, the present invention also relates to a computer program, which can be stored on a computer-readable medium such as a CD, a disk or any other data carrier. The present invention is, therefore, also a computer program having a program code which, when executed on a computer, performs the inventive method of encoding, converting or decoding described in connection with the above figures.

While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.

Furthermore, it is noted that all steps indicated in the flow diagrams are implemented by respective means in the encoder, gateway or decoder, respectively, and that the implementations may comprise subroutines running on a CPU, circuit parts of an ASIC or the like.

1. A process for coding a video or audio signal, comprising the steps of: processing the video or audio signal to produce a data stream representing a coded version of said video or audio signal, the data stream comprising consecutive data packets, each data packet being of a data packet type of a predetermined set of data packet types, at least one of the data packet types being a removable data packet type; and arranging the data packets within the data stream so that the data stream comprises consecutive access units of consecutive ones of the data packets, so that, within each access unit, the data packets belonging to a respective access unit are arranged in accordance with a predetermined order among the data packet types, in accordance with which a second data packet type precedes a first data packet type, such that, even when a data packet of the removable data packet type is removed from the data stream, borders between successive access units are still detectable from the data stream by use of the predetermined order and all data packets remain associated with the respective access unit they originally belonged to before removal of any data packet of the removable data packet type, with the detection by use of the predetermined order involving detecting an existence of a border between two successive access units each time a data packet of the first data packet type precedes a data packet of the second data packet type, wherein the steps are performed by circuit parts of an integrated circuit, or by a computer running a computer program of instructions implementing, during execution by the computer, the steps.
2. The process of claim 1, wherein the processing and arranging are performed further such that even when a data packet of the removable data packet type is removed from the data stream, the data stream is still consistent with predetermined parsing rules for parsing the data stream.
3. The process in accordance with claim 1, wherein each data packet comprises a type number being indicative of which data packet type same data packet is.
4. The process of claim 1, wherein the data packet of the removable data packet type further comprises payload data.
5. The process of claim 1, wherein all data packet types whose data packets are not absolutely necessary for retrieval of the information signal are removable data packet types.
6. The process of claim 1, wherein at least one removable data packet type is a negligible data packet type, with data packets of that type not being necessary for retrieval of the information signal from the data stream.
7. The process of claim 1, wherein the at least one removable data packet type is an essential data packet type, with data packets of that type being necessary for retrieval of the information signal from the data stream, and being associated with an identifier, wherein at least one data packet of the other data packets comprises the identifier.
8. The process of claim 1, wherein the predetermined set of data packet types further comprises at least one non-removable data packet type.
9. The process of claim 8, wherein the predetermined order at least defines as to whether data packets of the removable data packet type have to precede or have to follow data packets of the non-removable data packet type within an access unit.
10. The process of claim 1, wherein the processing and arranging are further performed such that each access unit comprises at least one non-removable data packet.
11. The process of claim 1, wherein each access unit is assigned to a different time portion of the information signal.
12. The process of claim 1, wherein the processing and arranging are further performed such that more than one data packet of a same data packet type belong to one access unit.
13. The process of claim 1, wherein the information signal comprises a video, and the processing and arranging are further performed such that the data packet types which are arranged according to the predetermined order comprise a supplemental enhancement information data packet type, comprising supplemental enhancement information including timing information or supplemental data that enhances usability of a version of the information signal obtained by decoding the successive access units but are not necessary for obtaining the version of the information signal by decoding the successive access units; and a coded picture data packet type comprising syntax elements of slice header data and/or syntax elements concerning slice transform coefficients of one or more slices of a picture of the video, with, according to the predetermined order, data packets of the supplemental enhancement information data packet type preceding data packets of the coded picture data packet type.
14. The process of claim 1, wherein the information signal comprises a video, and the processing and arranging are further performed such that the data packet types which are arranged according to the predetermined order comprise a sequence parameter set data packet type comprising sequence parameter sets which apply to a series of consecutive pictures of the video; and a picture parameter set data packet type comprising picture parameter sets which apply to one or more individual pictures of the video within a series of consecutive pictures of the video, a coded picture data packet type comprising syntax elements of slice header data and/or syntax elements concerning slice transform coefficients of one or more slices of a picture of the video, with, according to the predetermined order, data packets of the sequence parameter set data packet type and data packets of the picture parameter set data packet type preceding a last data packet of the coded picture data packet type within an access unit to which same belong.
15. The process of claim 14, wherein either one or both of data packets of the sequence parameter set data packet type and picture parameter set data packet type are conveyed separate from the data stream by an extra transmission link such that, according to the predetermined order, data packets of the sequence parameter set data packet type and data packets of the picture parameter set data packet type precede the last data packet of the coded picture data packet type within the same access unit to which same belong.
16. The process of claim 1, wherein the processing and arranging are further performed such that a resulting access unit size of the access units does not result in a buffer overflow at a decoder's side, by forecasting a buffer space consumption at the decoder's side at an assumption that a minimum amount of buffer space is available at the decoder's side and that buffered data packets are discarded from the buffer space at the decoder's side access unit-wise.
17. The process of claim 15, wherein the processing and the arranging are further performed such that a resulting access unit size of the access units does not result in a buffer overflow at a decoder's side, by forecasting a buffer space consumption at the decoder's side at an assumption that a minimum amount of buffer space is available at the decoder's side and that buffered data packets are discarded from the buffer space at the decoder's side access unit-wise.